27 research outputs found

    Performance Optimization on big.LITTLE Architectures: A Memory-latency Aware Approach

    The energy demands of modern mobile devices have driven a trend towards heterogeneous multi-core systems which include various types of core tuned for performance or energy efficiency, offering a rich optimization space for software. On such systems, data coherency between cores is automatically ensured by an interconnect between processors. On some chip designs the performance of this interconnect, and by extension of the entire CPU cluster, is highly dependent on the software's memory access characteristics and on the set of frequencies of each CPU core. Existing frequency scaling mechanisms in operating systems use a simple load-based heuristic to tune CPU frequencies, and so fail to achieve a holistically good configuration across such diverse clusters. We propose a new adaptive governor to solve this problem, which uses a simple trained hardware model of cache interconnect characteristics, along with real-time hardware monitors, to continually adjust core frequencies to maximize system performance. We evaluate our governor on the Exynos5422 SoC, as used in the Samsung Galaxy S5, across a range of standard benchmarks. This shows that our approach achieves a speedup of up to 40%, and a 70% energy saving, including a 30% speedup in common mobile applications such as video decoding and web browsing.

    Learning-based run-time power and energy management of multi/many-core systems: current and future trends

    Multi/many-core systems are prevalent in several application domains targeting different scales of computing, such as embedded and cloud computing. These systems are able to fulfil ever-increasing performance requirements by exploiting their parallel processing capabilities. However, effective power/energy management is required during system operation for several reasons, such as increasing the operational time of battery-operated systems, reducing the energy cost of datacenters, and improving thermal efficiency and reliability. This article provides an extensive survey of learning-based run-time power/energy management approaches. The survey includes a taxonomy of the learning-based approaches. These approaches perform design-time and/or run-time power/energy management by employing learning principles such as reinforcement learning. The survey also highlights the trends followed by learning-based run-time power management approaches, their upcoming trends, and open research challenges.

    Energy efficient run-time mapping and thread partitioning of concurrent OpenCL applications on CPU-GPU MPSoCs

    Heterogeneous Multi-Processor Systems-on-Chips (MPSoCs) containing CPU and GPU cores are typically required to execute applications concurrently. However, as will be shown in this paper, existing approaches are not well suited for concurrent applications, as they either consider only a single application or do not exploit both CPU and GPU cores at the same time. In this paper, we propose an energy-efficient run-time mapping and thread partitioning approach for executing concurrent OpenCL applications on both CPU and GPU cores while satisfying performance requirements. Depending upon the performance requirements, for each concurrently executing application, the mapping process finds the appropriate number of CPU cores and operating frequencies of CPU and GPU cores, and the partitioning process identifies an efficient partitioning of the applications’ threads between CPU and GPU cores. We validate the proposed approach experimentally on the Odroid-XU3 hardware platform with various mixes of applications from the Polybench benchmark suite. Additionally, a case study is performed with a real-world application, SLAMBench. Results show an average energy saving of 32% compared to existing approaches while still satisfying the performance requirements.

    Computer Support for the Educational Process

    Thermal cycling, as well as temperature gradients in time and space, affects the lifetime reliability and performance of heterogeneous multiprocessor systems-on-chips (MPSoCs). Conventional temperature management techniques are not intelligent enough to cater for the performance, energy efficiency, and operating temperature of the system. In this paper we propose a novel lightweight thermal management mechanism in the form of an intelligent software agent, which monitors and regulates the operating temperature of the CPU cores to improve the reliability of the system. We validated our methodology on the Odroid-XU4 SoC, where it reduced the operating temperature by 6.32% while improving performance by 7.96% and reducing power consumption by 9.45% compared with the state of the art.

    Dynamic Energy and Thermal Management of Multi-Core Mobile Platforms: A Survey

    Multi-core mobile platforms are on the rise, as they enable efficient parallel processing to meet ever-increasing performance requirements. However, since these platforms need to cater for increasingly dynamic workloads, efficient dynamic resource management is desired, mainly to enhance energy and thermal efficiency for a better user experience with increased operational time and lifetime of mobile devices. This article provides a survey of dynamic energy and thermal management approaches for multi-core mobile platforms. These approaches perform either proactive or reactive management. The upcoming trends and open challenges are also discussed.

    Predictive Thermal Management for Energy-Efficient Execution of Concurrent Applications on Heterogeneous Multicores

    Current multicore platforms contain different types of cores, organized in clusters (e.g., ARM's big.LITTLE). These platforms deal with concurrently executing applications, having varying workload profiles and performance requirements. Runtime management is imperative for adapting to such performance requirements and workload variabilities and to increase energy and temperature efficiency. Temperature has also become a critical parameter since it affects reliability, power consumption, and performance and, hence, must be managed. This paper proposes an accurate temperature prediction scheme coupled with a runtime energy management approach to proactively avoid exceeding temperature thresholds while maintaining performance targets. Experiments show up to 20% energy savings while maintaining high-temperature averages and peaks below the threshold. Compared with state-of-the-art temperature predictors, this paper predicts 35% faster and reduces the mean absolute error from 3.25 to 1.15 °C for the evaluated applications' scenarios.

    Runtime energy management of concurrent applications for multi-core platforms

    Multi-core platforms are employing a greater number of heterogeneous cores and resource configurations to achieve energy efficiency and high performance. These platforms often execute applications with different performance constraints concurrently, which contend for resources simultaneously, thereby generating varying workload and resource demands over time. There is little reported work on runtime energy management of concurrent execution, and it focuses mostly on homogeneous multi-cores and limited application scenarios. This thesis considers both homogeneous and heterogeneous multi-cores and broadens the application scenarios. The following contributions are made in this thesis. Firstly, this thesis presents online Dynamic Voltage and Frequency Scaling (DVFS) techniques for concurrent execution of single-threaded and multi-threaded applications on homogeneous multi-cores. This includes an experimental analysis and the derivation of metrics for efficient online workload classification. The DVFS level is proactively set through the predicted workload, measured through Memory Reads Per Instruction. The analysis also considers thread synchronisation overheads, and the underlying memory and DVFS architectures. Average energy savings of up to 60% are observed when evaluated on three different hardware platforms (Odroid-XU3, Intel Xeon E5-2630, and Xeon Phi 7620P). Next, an energy-efficient static mapping and DVFS approach is proposed for heterogeneous multi-core CPUs. This approach simultaneously exploits different types of cores for each application in a concurrent execution scenario. Using offline results, it first selects, for each application, the mapping (number and type of cores) that meets the performance requirement with minimum energy consumption. Then online DVFS is applied to adapt to workload and performance variations. Compared to recent techniques, the proposed approach has an average of 33% lower energy consumption when validated on the Odroid-XU3.
    To eliminate the dependency on offline application profiling and to adapt to dynamic application arrival/completion, an adaptive mapping approach coupled with DVFS is presented. This is achieved through an accurate performance model, an energy-efficient resource selection technique, and a resource manager. Experimental evaluation on the Odroid-XU3 shows an improvement of up to 28% in energy efficiency and 7.9% better prediction accuracy by the performance models.

    Online concurrent workload classification for multi-core energy management

    Modern embedded multi-core processors are organized as clusters of cores, where all cores in each cluster operate at a common voltage-frequency (V-f) setting. Such processors often need to execute applications concurrently, exhibiting varying and mixed workloads (e.g. compute- and memory-intensive) depending on the instruction mix and resource sharing. Runtime adaptation is key to achieving energy savings without trading off application performance under such workload variabilities. In this paper, we propose an online energy management technique that performs concurrent workload classification using the metric Memory Reads Per Instruction (MRPI) and proactively selects an appropriate V-f setting through workload prediction. Subsequently, it monitors the workload prediction error and the performance loss, quantified by Instructions Per Second (IPS), at runtime and adjusts the chosen V-f setting to compensate. We validate the proposed technique on an Odroid-XU3 with various combinations of benchmark applications. Results show an improvement in energy efficiency of up to 69% compared to existing approaches.
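
    The classify, predict, select, and compensate loop described in this abstract can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the frequency table, the MRPI thresholds, the moving-average predictor, and the names `MrpiGovernor`, `select_freq_index`, and `max_ips_loss` are all assumptions made for illustration, and only the IPS-based compensation is modelled (prediction-error tracking is omitted).

```python
# Hypothetical per-cluster frequency table (MHz) and MRPI class boundaries.
# None of these values come from the paper; they only make the sketch runnable.
FREQS_MHZ = [800, 1200, 1600, 2000]
MRPI_THRESHOLDS = [0.01, 0.03, 0.06]


def mrpi(memory_reads, instructions):
    """Memory Reads Per Instruction for one sampling interval."""
    return memory_reads / instructions if instructions else 0.0


def select_freq_index(predicted_mrpi):
    """Map a predicted MRPI to an index into FREQS_MHZ.

    A higher MRPI is treated as a more memory-bound workload, which is
    assumed to tolerate a lower frequency with little performance loss.
    """
    workload_class = sum(predicted_mrpi > t for t in MRPI_THRESHOLDS)
    return len(FREQS_MHZ) - 1 - workload_class


class MrpiGovernor:
    """Toy governor: predict MRPI, pick a frequency, compensate on IPS loss."""

    def __init__(self, alpha=0.5, max_ips_loss=0.1):
        self.alpha = alpha                # weight of the newest sample (assumed)
        self.max_ips_loss = max_ips_loss  # tolerated relative IPS shortfall (assumed)
        self.predicted = None

    def step(self, memory_reads, instructions, ips, target_ips):
        observed = mrpi(memory_reads, instructions)
        # Exponentially weighted moving average as a stand-in predictor.
        if self.predicted is None:
            self.predicted = observed
        else:
            self.predicted = (self.alpha * observed
                              + (1 - self.alpha) * self.predicted)
        idx = select_freq_index(self.predicted)
        # Compensation: step one level up if measured IPS falls too far
        # below the target.
        if target_ips > 0 and (target_ips - ips) / target_ips > self.max_ips_loss:
            idx = min(idx + 1, len(FREQS_MHZ) - 1)
        return FREQS_MHZ[idx]
```

    In this sketch, a sample with very few memory reads per instruction selects the highest frequency, a memory-heavy sample selects the lowest, and a measured IPS shortfall beyond the tolerance raises the selection by one level.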

    Dataset supporting the article entitled "Online Concurrent Workload Classification for Multi-core Energy Management"

    This dataset supports the article entitled "Online Concurrent Workload Classification for Multi-core Energy Management" accepted for publication in ACM/IEEE Design Automation and Test in Europe (DATE), 2017.

    Memory and thread synchronization contention-aware DVFS for HPC systems

    Due to the operating costs and failure rates of computing platforms, energy efficiency has become a major concern for modern and future many-core systems. In the quest for high performance, the power consumption growth rate must slow down while delivering more performance per unit of power. To improve the energy efficiency of such systems, processors are equipped with low-power techniques such as dynamic voltage and frequency scaling (DVFS) and power capping. These techniques must be controlled carefully according to the workload; otherwise, they may result in significant performance loss and/or power consumption due to system overheads (e.g. DVFS transition latency). Existing approaches [1], [2] are not effective in adapting to workload variations, as they do not consider the combined effect of application compute-/memory-intensity, thread synchronization contention, and non-uniform memory accesses (NUMAs) owing to the underlying processor architecture. This poster discusses a workload-aware runtime energy management technique that takes the aforementioned factors into account for efficient V-f control.